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Abstract 

Researchers working with mathematical models are often confronted by the related problems of 
parameter estimation, model validation, and model selection. These are all optimization problems, well- 
known to be challenging due to non-linearity, non-convexity and multiple local optima. Furthermore, 
the challenges are compounded when only partial data is available. Here, we consider polynomial 
models (e.g., mass-action chemical reaction networks at steady state) and describe a framework for their 
analysis based on optimization using numerical algebraic geometry. Specifically, we use probability-one 
polynomial liomotopy continuation methods to compute all critical points of the objective function, then 
filter to recover the global optima. Our approach exploits the geometric structures relating models and 
data, and we demonstrate its utility on examples from cell signaling, synthetic biology, and epidemiology. 


1 Introduction 

Across the physical, biological, and social sciences, mathematical models are formulated and studied to 
better understand real-world phenomena. Often, multiple models are developed to explore alternate hy¬ 
potheses. It then becomes necessary to choose between different models, for example, based on their fit 
with noisy experimental data. This is the problem of model selection, a fundamental scientific problem with 
practical implications (1-3). 

The standard approach to model selection is to first estimate all model parameters and hidden variables 
from the data, then select the model with the smallest best-fit error, up to some penalty on model complexity 
(4, 5). Thus, at its core, model selection is intimately tied to parameter estimation, which, for a given model 
f(a,x) = 0 in the parameters a and variables x with measurable “outputs” y = g(x), say, can be written as 
the following optimization problem: 


min ||y — y '|| 2 s.t. 

a,x,y 


f f(a, x) = 0 

1 y = g(x) 


(i) 


where y' denotes the observed data, i.e. measured outputs. Unless / and g are convex, solving (1) is a 
nonconvex problem, which can be challenging as standard local solvers run the risk of getting trapped in 
local minima (especially in high dimensions). This can be mitigated somewhat with techniques such as 
simulated annealing (6, 7) or convex relaxation which has been successful for model invalidation (8-10), but 
there is generally no guarantee that a global minimum will be found. 

When / and g are polynomial, however, problem (1) can be solved globally by finding all roots of an 
associated polynomial system. In this case, ideas from computational algebra and algebraic geometry can 
be effective; see, e.g., (11-14) for applications of Grobner bases in systems biology and (15) for applications 
of algebraic geometry to statistical inference. Such symbolic methods tend to be computationally expensive, 


1 



which limits their use in practice. Thus, although they provide a solution in principle, new algorithms and 
techniques are yet desired. 

In this paper, we aim to fill this gap by proposing a framework for global parameter estimation for 
polynomial deterministic models using numerical algebraic geometry (NAG), a suite of tools for numerically 
approximating the solution sets of multivariate polynomial systems via adaptive multi-precision, probability- 
one polynomial homotopy continuation (16, 17). Our approach scales well in dimension relative to classical 
symbolic methods (18), and, while it comes with a higher computational cost than standard local opti¬ 
mization, it has a probability-one guarantee to recover the global optima, solving problem ( 1 ) in the strong 
sense. This allows us to reason rigorously about model fit and to address the related problems of model 
selection and parameter estimation from a maximum-likelihood perspective. 

We demonstrate our techniques on examples from biology, where polynomial models often arise as the 
steady-state descriptions of mass-action chemical reaction networks. Although some limitations remain, 
we believe that this work achieves its primary purpose of introducing NAG as a valuable complement to 
existing tools for model evaluation and analysis. Additionally, this paper highlights specific challenges that 
arise when using polynomial methods for model inference, such as dealing with positivity constraints and 
non-isolated solutions, and provides guidance for tackling these challenges. 

The remainder of the paper is organized as follows. In the next section, we state precisely the problems 
with which we are concerned: model validation, model selection, and parameter and hidden variable estima¬ 
tion. We then present NAG algorithms for solving each problem. Finally, we showcase our approach on a few 
examples including cell death activation, synthetic bio-circuits, HIV progression, and protein modification. 

2 Problem Statement 

Consider a model whose dynamics are described by the system of polynomial differential equations 

x = f(a,x) ( 2 ) 

where a = (ai,..., a*,) are parameters (e.g. rate constants in a deterministic mechanistic model, such as a 
chemical reaction network with mass-action kinetics), x = (aq,..., x n ) are variables, and / = (/i,..., f r ) 
are polynomials in x and a with measurable outputs y = g(x ) where y = {yi, ■ ■ ■ ,ym), m < n, and 
g = (< 71 ,... ,g m ) are polynomials in x. While the parameters ai ,,ctk are treated as fixed variables in 
our exposition, we separate them from x±,... ,x n to respect how such variables are treated differently in 
experimental and computational settings. 

In algebraic geometry a variety is a solution set of a system of polynomial equations; we use this 
terminology for our next two definitions. The real model variety is the solution set of the system 

/ = 0 

y — g(x) = 0 , that is, 

(V.m)r := {(a, x, y) 6 R fc+Il+m : f( a , x) = 0 , y - g(x) = 0 } 

corresponding to the steady states of the model. Now, consider, for simplicity, the case of a single data 
point y = (yi ,..., y m )■ (See SI Appendix for multiple data points.) The real data variety is then the affine 
linear space 

(Vz>)k := {(a, x, y) 6 R k+n+m : Vi = y u Vi = 1 ,..., m}, 

with dim(Vx>)R = k + n. We consider the case when the data includes some extrinsic (measurement) noise; 
we assume the errors {ci, on the observed data variables are uncorrelated random variables and each 

error e, is normally distributed with known variance (which can be obtained by instrument calibration). 

Using this geometric framework, the problems of (1) model validation, (2) model selection, and (3) pa¬ 
rameter estimation, can be described precisely in terms of the real algebraic varieties (Vm)r and (Vz>)r- 


(3) 

(4) 
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2.1 Problem 1: Model Validation 


For model validation, we want to determine whether a deterministic polynomial model M is compatible 
with the data according to a given significance level a. This is akin to asking whether the model is a “good 
fit” for the data. A natural goodness-of-fit statistic is: 

m / \2 

d 2 := min^ ——^— subject to (a, x , y) 6 (V»)r- (5) 

i =1 

When the variances differ, d 2 is the minimum squared weighted Euclidean distance between (VvOr and 
(Vd)r. When all variances are equal to one, the value of (5) is just the minimum squared distance. For the 
remainder of the text, we will assume the latter, knowing that we can simply rescale. 

Optimization problem 5 can be derived directly from the log-likelihood function 

log L(a, x,y) = J2\2 l0g 27RT * ~ V \a 2 

as demonstrated in Section 2.1 of the SI. This provides a bridge to standard statistical model selection tools 
such as AIC (19) and BIC (20). 

If the data are generated from the model A4 with normally distributed extrinsic noise, the distribution 
function of d 2 is dominated by that of the chi-squared distribution with m degrees of freedom, Xmi w h ere 
m is the number of measurable outputs. Thus, if p a is the upper a-percentile for X™, then Pr(d 2 > p a ) < 
Pi(U > p a ) = a where U ~ xL- We reject the model M as incompatible if d 2 > p a \ otherwise we say that 
the model A4 is compatible. 

If the real model and data varieties intersect, that is, (V»)r n (Vx>)r / 0, then d 2 = 0, and we also say 
that the model is compatible with the data. If there are restrictions on (a, x , y ), for example if all parameters 
and variables are required to be non-negative, then finding d 2 becomes a constrained optimization problem 
(see SI Appendix). 

2.2 Problem 2: Model Selection 

For model selection, we are given a set of models {Ati,..., M s } and want to determine the model of best 
fit. In this setting, we use the statistic d 2 to make a selection, either by choosing the model with the smallest 
goodness-of-fit statistic or by using d 2 in conjunction with a complexity penalty, similar to the Bayesian or 
Akaike information criteria (2). 

If the statistic d 2 evaluates to zero for all (or even multiple) models, then we are unable to make a 
selection between the models. This can be remedied by designing experiments that yield more informative 
data. For example, measuring more variables can reduce the dimension of (Vm)* H (Vx>)r; the most 
informative situation is when (Vm*)r FI (Vd)r = 0 for all models. This indeterminacy can also be resolved 
by taking multiple measurements and minimizing the joint squared distance (see SI Appendix). 

2.3 Problem 3: Parameter Estimation 

Parameter estimation can be achieved by finding the point ( a*,x*,y *) 6 (Vm)r that minimizes the value 
of (5). The point ( a*,x*,y *) is the maximum likelihood estimate under the given noise assumptions. The 
parameter estimate is then a*; the estimate of the hidden variables, x*\ and the estimate of the “de-noised” 
outputs, y*. Of course, if the data and model varieties intersect, there will be one or more (possibly infinite) 
choices for (a*,x*,y*). Otherwise, it is a matter of solving a polynomial system that yields the point(s) on 
(Vm)r nearest (Vx>)r. This is described in more detail in the next section. 


3 Geometry 

In each of the problems stated above, we seek to minimize the distance between (Vm)r and (Vx>)r or to 
find the intersection of these two sets. Standard methods for solving non-linear optimization problems are 
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local in nature, i.e., only guaranteed to converge to a local minimum which may or may not be the global 
minimum. However, using NAG, we find all local extrema over C, necessarily including the global minimum. 
Due to this global benefit, NAG has been used before in statistical inference in the field of algebraic statistics 
( 21 , 22 ). 

Let Vm £ C k+n+m be the (complex) Zariski closure of (V[m)r and Vv £ £, k + n + m be the (complex) 
Zariski closure of (Ed)r. We will refer to Vm and Vt> as the model variety and data variety , respectively. 

The problem of determining the intersection of Vm H Vjy is simply a matter of solving the polynomial 
system obtained by taking the union of the polynomials defining Vm and the polynomials defining Vd- 
This is handled directly by NAG. If the intersection is nonempty and positive-dimensional (complex curves, 
surfaces, etc), real points can be found using the polynomial liomotopy method described in (23), a method 
based on symbolic algorithms in (24), and, more classically, on the decision method in (25). 

The problem of finding the points on the two varieties nearest one another can also be stated in terms of 
a polynomial system, on which we then call NAG solvers to find solutions. A well-known necessary condition 
for local extrema is given by Lagrange multipliers. In the main text we assume that r + m = codim Vm ; 
however, when this is not the case, the number of equations can be reduced (see SI Appendix). 

Proposition 1 (Equations given by Lagrange multipliers). Let c = codim Vm and let f — (/{,..., /') = 0 
on a Zariski open set o/Vm- //(a, x, y) 6 (V_m)r is a local minimum of 

m 

XI (Vi -Vi ) 2 > ( 6 ) 

i= 1 


then there exists A := (Ai,..., A c ), such that (a,x,y, A) is a solution to the system 


/' = 0, 
y - g(x) = o 

AiV/( + ... A C V/' + 


= 0 . 


(7) 

( 8 ) 

(9) 


Solving this system with NAG will provide us with all local extrema of ( 6 ) over V», from which we may 
easily select the pair of nearest points. The geometry of zero sets of systems such as (7)-(9) are described 
in (26). 


3.1 Numerical Algebraic Geometry 

Given a polynomial system F consisting of r polynomials and N variables, NAG packages, such as Bert ini 
(17), PHCpack (27), H0M4PS-2.0 (28), use polynomial liomotopy continuation to provide probability-one 
numerical methods for finding approximations of all isolated complex solutions of F = 0 (points) as well 
as witness points on all positive-dimensional irreducible components of the solution set of F = 0. These 
methods are probability-one in that the computations include some randomization, and this randomization 
will yield theoretically correct results so long as the random numbers are not chosen from some measure 
zero set in the parameter spaces of potential choices (16, 29). 

If x 6 is a real solution of F = 0, it is either isolated among the complex solutions or it lies on a 
positive-dimensional complex irreducible component. In the former case, the methods of NAG will find x 
and recognize it as real. In the latter case, x can be difficult to uncover. However, for our purposes, we 
usually only need to verify the existence of a real solution. In this case, we can find witness points on all 
positive dimensional components and then use the procedure in Section 2.1 of (23) to verify the existence 
of real points. 


3.2 Algorithms 

We present three related algorithms to address model validation, model selection, and parameter estimation 
(see Fig. 1). 
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The aim of the first algorithm, Algorithm 1, is to find the pair of points that minimize the distance 
between (Vm)r and (Vx>)r. If (Vm)r H (Vz>)r = 0, then this is obtained by solving (7)-(9); otherwise, 
additional techniques are required. 

We note also simply that these techniques are indeed probability-one algorithms. The computations 
will necessarily be carried out in finite time and the steps proceed linearly (no loops), so the methods will 
necessarily finish. 

3.2.1 Algorithm 1: Model validation 

Input Model At, data V = {)/}, tolerance a 
Output yes or no 

1 If Vm n Vv = 0, goto Step 3. 

2 If dim(Vx fi Vv) > 0 and 
(Vm)r (~l (Vx»)k 7 ^ 0, return yes, 
else, goto Step 3. 

3 Find a pair ( 01 , 02 ) € (Vx)r x (Vd)r that minimizes ( 6 ) (using NAG software such as 

Bert ini or PHCpack). 

4 If ||zi — Z 2 II 2 < p a , return yes; else no. 


The computation of the intersection Vm fl Vv in Steps 1 and 2 can be determined in several ways. The 
simplest way is by considering dimensions: if dim(Vx) + dim(Vu) exceeds the ambient dimension, they 
will almost surely intersect. If the ambient dimension is larger than the sum of the variety dimensions, 
they typically do not intersect. To compute the intersection (or check to see if it is nonempty), one could 
substitute y — g{x) for ( 8 ) in the system (7)-(8). 

In Step 2 , if dim(VM fl Vv) — 0, then the intersection of the two varieties consists of finitely many points; 
the condition (Vai)r fl (Vd)r 7 ^ 0 indicates that at least one of the points is real, which is straightforward 
to determine. If dim(V» fl Vv) > 0, to check if (Va-OrG (Vx>)r 7 ^ 0, one needs more sophisticated methods, 
such as those in (23). 

To find the pair ( 2 : 1 , 02 ) in Step 3, one may solve the polynomial system (7)-(9). If there is a positive¬ 
dimensional set of (complex) extremal points, then the procedures in (23) could be used to determine if the 
set contains a real point. 

If there are constraints on the variables or parameters, for example, if we seek to minimize (6) over 
the non-negative part of (Vai)r, then the algorithm is updated as follows: if the dim(V» fl Vv) = 0 or 
Vm G Vv = 0 then the Karush-Kuhn-Tucker equations as described in Section 3.1 of the SI Appendix should 
be used, while if dim(V» fl Vv) > 0, or if there is a positive-dimensional set of extremal points at Step 3, 
then the algorithm should return possibly. There are methods (30-32) for finding real points, curves, and 
surfaces within complex components of dimension 2 or less, but little is known about higher dimensions. 

3.2.2 Algorithm 2: Model selection 

In this case, there are several competing models, each with its own polynomial system. The algorithm 
proceeds much as in Algorithm 1, but iterated for the several models under consideration. If a threshold a 
is provided, one should first reject models that do not adequately support the data (d 2 > p a ), then choose 
the model yielding the minimum value of d 2 (up to some complexity penalty). Various conclusions may be 
drawn, e.g., no model adequately fits the data or three models adequately fit the data and model Mi provides 
the best fit. 

3.2.3 Algorithm 3: Parameter estimation 

Again, this algorithm is similar to the first. The input consists of only one model M and data T>. It is 
assumed that there are unknown parameters and the goal is to find values of these parameters producing 
the best fit between M and V. The output of Steps 2 and 3 need to be adjusted appropriately. The output 
of Step 4 is simply 02 since there is no a to be used for rejection. The method also simultaneously estimates 
hidden/unknown variables and “de-noised” outputs. 
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3.2.4 Simple Example 


To illustrate Algorithm 1, consider a simple model with three variables x,y,z and three parameters a, 5, c 
satisfying 


x 2 y 2 z 2 

-h — H-= 1. 

a 2 b 2 c 2 


Let a = 0.1 and assume, for this example, that the variances on the errors are a 2 = 0.1 for i = 1,2,3. The 
model variety Vm is simply an ellipsoid (Fig. 2A) where a, b, c describe the principal axes. Suppose we know 
that a,b,c = 1. For the case when the outputs are x and y and x = 0, y = 0 and a = 0.1, Step 2 indicates 
that the data does fit the model, i.e. the model is compatible with the data. In this case, there are two real 
points in the 0-dimensional intersection Vm n V-d (see Fig. 2B). For the same set-up, but with data x = 0, 
Step 3 indicates that this data possibly fits this model. Since there is a positive-dimensional intersection 
(Fig. 2C), it is possibly compatible (depending on constraints imposed by the user). For the same model and 
a, but different data x = 1.7, y = 0, Step 4 yields points (1, 0, 0) and (1.7, 0, 0), so ||zi — Z 2 W 2 > 0.4605 = p a , 
and the model is rejected (Fig. 2D). However when the data are x = 1.01 ,y = 0, Algorithm 1 indicates 
model compatibility (Fig. 2E). Previous algebraic methods that required Grobner basis calculations would 
result in a zero set and thus those approaches are not useful here. 


4 Results 

We demonstrate our methods on problems in biomedicine: cell death activation, synthetic biology, epidemi¬ 
ology and multisite phosphorylation. Each of the forthcoming applications can be written as a mass-action 
chemical reaction network, which has the form x = f(a,x ), studied at steady state: f(a,x ) = 0 as in the 
problem statement. Throughout these real-world examples, we emphasize the pivotal computations in these 
methods, such as determining the dimension of the intersection Vm fl Vp and finding points in the intersec¬ 
tion (see Fig. IB). Our first two examples, cell death signaling and genetic toggle bio-circuits, demonstrate 
how to handle positive dimensional intersections using two different approaches, while the remaining two 
examples, HIV and MAP kinase signaling, highlight analysis of zero-dimensional and empty intersections. 
In the following examples, we are interested in results that can be interpreted biologically, therefore we 
restrict our analysis to non-negative real solutions. 

4.1 Cell death activation 

We demonstrate model compatibility (Algorithm 1) on an example from receptor-mediated programmed cell 
death, which is initiated by the activation of death receptors upon the detection of extracellular death ligands 
(33-36). We consider, in particular, the “cluster” model of (12), which was inspired by crystallographic 
data (37) and describes the recruitment of receptors by ligands into local self-activating clusters capable of 
bistability. 

The cluster model is a system of 3 degree-4 polynomials in the form of (2) in 3 variables (representing 
various receptor states) and 6 rate parameters, supplemented by ligand and receptor conservations (see SI 
Appendix). We assume that we can measure the total ligand and receptor concentrations, which may be 
considered experimental inputs, as well as the concentration of active receptors. We do not assume access 
to the concentrations of other individual receptor states nor to any of the rate parameters. 

A steady-state data point was simulated from the model with all parameters and initial concentrations 
drawn independently and identically from the log-normal distribution In 7V(0,4), then combined and cor¬ 
rupted with i.i.d. noise from A/”(0,0.1) to obtain y. The real model and real data varieties intersect in the 
positive orthant with a distance zero and hence the model is indeed compatible with the data. 

4.2 Synthetic biology and experimental design 

We demonstrate an example from synthetic biology with excess intersection (dim(V>i fl Vt>) > 0). A goal in 
synthetic biology is to design or modify existing biological systems with new features according to specific 
design criteria. Reverse engineering of biological systems often includes modules (such as feedback loops) and 
how these are interconnected are described by different circuits (models). Understanding differences between 



bio-circuit implementations is crucial; therefore we compare three bistable bio-circuits models analyzed in 
(38): monomer-dimer toggle circuit (.Mi), dimer-dimer toggle circuit (.M2), and single operator gene circuit 
(.M3), which were initially presented in (39-41). The model variables include genes (X. t ) and proteins (Pi) 
where i = 1 for .M3 or j = 1,2 for Mi, M 2 , as well as these species complexes (e.g., P,P,., XjP, P, ). These 
variables interact following mass-action kinetics and form systems of polynomial differential equations where 
Mi,M 2 , and M 3 have 7, 8 and 6 model variables, respectively, and 10 , 12 , and 9 kinetic parameters, 
respectively. The models can be reduced (given in SI Appendix) by assuming that the total amount of gene 
1 (X ltot ) and gene 2 (X 2tot ) is conserved. 

Suppose that the total amounts Xi tot and X 2 tot and specific protein synthesis and degradation parame¬ 
ters kbas !, kba S2 > kdegi , and kdeg 2 are known. Since protein concentrations are often measurable, we assume 
that our data are Pi, P 2 , and their complexes Pi Pi, and P 2 P 2 . The aim is to select the best model Mi, 
M 2 , and M 3 given the data. We simulate steady-state data from the dimer-dimer toggle model (M 2 ) and 
add Gaussian noise from _/V(0, 0.1). We find that all three have positive dimensional intersections, where 
the dimension of the intersections are: 


diin((V» 1 )is H (Vd)r) = 3, 
dim((V» 2 )R H (Vx>)r) = 4, 
dim((V» 3 )R fl (Vd)r) = 3. 

Clearly, all three models are compatible with the data, thus one can only select a “best fit” model using 
data-independent measures, e.g. number of parameters, dimension, etc. 

In fact, the dimensions of the model and data varieties can help us design more informative experiments 
for model selection. For example, the variety associated to the monomer-dimer toggle model lives in a 15 
dimensional ambient space, Vx, C C 15 , and has dimension 6 , while the data variety Vx> 1= C 15 has dimension 
12; thus by the Dimension Theorem, we can determine that dim(V.M i nV'D) > 6+12—15 = 3. This dimension 
calculation provides guidance towards the number of minimal additional variable and parameter values that 
must be measured to ensure Vmi H V-d = 0, he., at least four more variables and parameter values must be 
known. 

Suppose we can experimentally measure four forward biochemical reaction rate constants (e.g., k c F, kkF, 
k n F, and kki1), then is cut down by four dimensions and does not intersect V-d- We get similar results 
for the model varieties associated to M 2 and M 3 provided we measure rate constants specific to these 
models (see SI Appendix). Now that all the intersections are empty, we run Algorithm 2 and find that the 
sum of squares (Eq. (5)) for each model are as follows: d\ = 0.2262, d\ = 7.34 x 10“', and = 0.3040. 
Therefore, we select the M 2 model, which is indeed the true model. 

4.3 HIV Progression 

We demonstrate parameter estimation (Algorithm 3) on an example coming from epidemiology. We use a 
model that includes long-term HIV dynamics from initial viremia, latency, and virus increase (42), based on 
(43). In the model (see SI Appendix), the HIV virus inhibits the CD4+T cell population while promoting 
macrophage proliferation, which in turn houses the replicating virus. As macrophages proliferate, the virus 
reservoir increases, so the model offers a description of HIV patient progression to AIDS. Model variables 
x are uninfected CD4+T cells (T), infected CD4 + T cells (T)), uninfected macrophages (M), infected 
macrophages (Mi), and HIV virus population (V). 

Hernandez-Vargas et al. show that the model can have two real equilibria, one of which is stable, repre¬ 
senting patients that are “long-term non-progressors” (42). The parameters a are (si, s 2 , /+,..., he, Si,..., (5s), 
where Si represents synthesis of T cells and macrophages, k are rate constants describing interactions be¬ 
tween variables x, and Si represents natural death. For this example, y = x. We estimated the natural 
death of the virus, parameter 65 , using the long-term non-progressors steady-state value (Table 3 of (42)) 
and adding noise to each variable ~ A/”(0,1). By Algorithm 3, the data variety and model variety do not 
intersect. We find the closest point and estimate <5g = 2.99876 (true value of 65 = 3). 
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4.4 Multisite phosphorylation with experimental data 

We examine phosphorylation mechanisms of cellular signaling with experimental data, and demonstrate 
model selection (Algorithm 2) and parameter estimation (Algorithm 3). We focus on phosphorylation, 
a key cellular regulatory mechanism that has been the subject of extensive study, both experimentally 
and theoretically ((44) and references therein). An area of interest is the mechanism by which a kinase 
phosphorylates a two-site substrate, either distributively, where the kinase can add at most one phosphate 
before dissociating, or processively, where it can add both phosphates in sequence. The MAPK/ERK 
pathway is a well-known system for studying phosphorylation, whereby MEK (kinase) phosphorylates ERK 
(its substrate). Aoki et al. (45) showed experimental evidence, while working with polynomial models, that 
the mammalian MAPK/ERK pathway acts distributively in vitro but processively in vivo. 

We compare these distributive and processive models against the in vivo data reported in the same 
study. The distributive model consists of 12 molecular species and 17 mass-action reactions, while the 
processive model has 14 species and 18 reactions (species correspond to variables, each reaction corresponds 
to a parameter). The data take the form of 36 concentration measurements of three aggregate phosphoforms 
over a range of 12 EGF stimulation levels. All model parameters are calibrated using in vitro estimates 
by (45), except the parameter fcj representing EGF loading, which we estimate for both models (see SI 
Appendix). 

Next we perform model selection by running Algorithm 2 on each data point individually and select 
independently for each run the preferred model (more details in SI Appendix). Under low EGF stimula¬ 
tion, the best model estimates are nearly identical with a slight preference for distributive. At high EGF 
stimulation, the models are identical with no preference for one model over the other. These results can be 
justified by noting that the main distinction between distributivity and processivity is nonlinear switching 
behavior (i.e., a sigmoidal response curve), and this only occurs at intermediate stimulations. However, at 
medium EGF stimulation (see Fig 3), there is a preference for the processive model, which supports the 
findings in (45). 


5 Conclusion 

The problem of determining whether given real-world data fits one or more given mathematical models is 
challenging. When a model is defined by algebraic (polynomial) functions, the methods of NAG may be 
employed to study the geometry underlying the model and data. In particular, these methods are useful 
for model variables observed at steady-state. We demonstrated this numerical and geometric framework 
for comparing models with experimental dose-response data in MAPK/ERK pathway and highlighted that 
the intermediate EGF doses were the most informative for model selection, complementing another Ending 
that model selection results can be very sensitive to experimental parameters (46). 

Despite the difficulties associated with positive dimensional components, and limitations in analysis, 
we can reproduce compatible models, and furthermore can predict additional information, such as mea¬ 
surements of parameters, that are necessary for selecting models. Our geometric investigations of positive 
dimensional components may perhaps relate to algebraic analyses for biochemical models, such as model 
identifiability or matroids for experimental design (14, 47, 48). 

There are further directions to be considered in this vein, aside from making the existing computational 
methods more efficient. First, there would be great value in developing strictly real geometric methods for 
solving polynomial systems such as those that appear in this article. Some such techniques exist, but only 
in very special circumstances. Second, there would be much value in developing effective numerical methods 
for treating inequalities. It should be noted that the methods described in (49) and the references therein 
will incorporate such constraints, though the cost of such computations restricts their use to relatively low 
dimensions. Finally, there is likely much to be gained from considering the geometry underlying models not 
defined by algebraic functions. Algebraic geometry provides very clean, well-understood structures, paving 
the way for numerical methods. Differential geometry or topology could lead to similarly useful techniques 
for model selection and parameter estimation. 



6 Materials and Methods 

6.1 Numerical algebraic geometry 

General references for NAG include (16, 29), with the latter doubling as a user manual for the software 
package Bertini. For computations, we used Bertini 1.4 and Macaulay2 version 1.6. 

6.2 Data generation 

Data simulated from cell death activation, synthetic biology and HIV models were performed in MATLAB 
R2014b using odel5s. 
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Figure SI: Schematic of numerical algebraic geometry framework corresponding to Algorithms I—III. (A) 
Input to algorithms include model (system of polynomials) translated into a model variety (red), and steady- 
state data translated into a data variety (blue). (B) Flow chart of model compatibility, parameter estimation 
and model selection methods. Examples (green) are described in Results section. 
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1 Geometry 

In this section we briefly review the some of the fundamental geometric concepts needed for the methods 
in the main text. 


1.1 Numerical algebraic geometry for isolated solutions 

Numerical algebraic geometry refers to the use of numerical methods, particularly honrotopy continuation- 
based methods, to compute approximations to solutions of polynomial systems. In other words, given a 
polynomial system F : C N —> C n with n equations in N variables, numerical algebraic geometry seeks to 
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find numerical approximations of all z 6 C N such that F(z) = 0. It may be the case that the solution 
set of F has infinitely many such points (curves, surfaces, etc.), in which case the data structure that 
encodes the solutions is called a witness set. See the description of the numerical irreducible decomposition 
below for more on this. For simplicity, we assume in this section that N = n. The books [1, 2] and the 
references therein provide much more detailed explanations than those included in this section, from more 
mathematical and computational perspectives. 

The core technique for most numerical algebraic geometry algorithms is homotopy continuation. The idea 
of homotopy continuation is to cast the polynomial system to be solved, say F : C N —> C N , as a member of a 
parameterized family of polynomial systems, H : x C —> C N , called a homotopy H(z,t) with parameter 

t 6 C, that includes one polynomial system G : C N —> C N that is easily solved and has the special property 
of having “enough” isolated solutions. In this document, we use the Bertini standard assumption that 
H(z, 1) = G(z) and H(z , 0) = F(z), i.e., t marches from 1 to 0. There are several canonical options for the 
construction of such a homotopy, and the reader is encouraged to consult the general references above and 
the references therein for further details. 

As t varies from 1 to 0, some results from algebraic geometry tell us that the solutions of the polynomial 
system H(z,t.) = 0 vary continuously and generally stay distinct until t = 0, where they may converge 
to solutions of F(z) = 0 or diverge. More specifically, there is a measure zero subset of f 6 C, meaning 
a finite set of points in this particular parameter space, over which two or more solutions coalesce. Such 
occurrences are thus probability zero events and, furthermore, can be detected on the fly and avoided. Said 
more technically, there is a Zariski open, dense set of the parameter space above which the solution set is 
finite and consists of a fixed number of solutions. Here, “Zariski” refers to the Zariski topology , for which 
basic open sets are the complements of solution sets of polynomial systems. 

In practice, t is moved in discrete increments, not continuously. For each solution at t = 1, a path 
of solutions is tracked using numerical predictor/corrector methods as t advances to 0. Implementations 
typically utilize adaptive step lengths and adapative precision. There are far too many details about this 
procedure to give a thorough explanation here. Instead, refer to the references above (and those therein) 
for further details. 

Ultimately, the output of this procedure is a superset of numerical approximations of the isolated solu¬ 
tions of F(z) = 0, possibly including approximations to points lying on positive-dimensional components, 
if any. It is important to note that this procedure necessarily works over C and finds all complex solutions. 
Real solutions could be buried somewhere within the complex solutions, and it is particularly difficult to 
extract these outside of the zero-dimensional case. However, methods do exist to extract such a real point 
[3-5]; here we use the method of [5], which is guaranteed (with probability one) to minimize the distance of 
a prescribed real point to each real connected component of the variety defined by F(z) = 0. 

1.2 Numerical algebraic geometry for positive-dimensional solution sets 

For solution sets of positive dimension (curves, surfaces, etc.), there is an extension of homotopy continuation 
referred to as the numerical irreducible decomposition (NID). As opposed to the case of systems of linear 
equations (at most one solution component of one dimension), there may be many components of many 
different dimensions. For example, one solution set might consist of seven components of dimension four, 
five surfaces, three curves and 15 isolated points. Furthermore, components may be singular, meaning that 
the Jacobian matrix is rank-deficient throughout the component. 

Technical definitions of dimension, irreducible component, and the like go a bit beyond the scope of 
this paper. It is enough to know that each “piece” of a solution set has a fixed dimension (e.g., a curve 
has dimension one, a surface two, etc.) and by the dimension of a solution set of a polynomial system of 
equations, we mean the maximum of the dimensions of the irreducible components. 

The numerical irreducible decomposition of a solution set of a polynomial system consists of a catalog 
of the dimensions and degrees of each irreducible component, along with a set of witness points on each 
component. By degree, we mean the number of points in the intersection of a component with a randomly- 
chosen affine linear space of complementary dimension. The witness points on a component are then exactly 
these points (and thus depend on the choice of linear space). One fundamental result from algebraic 
geometry is that an irreducible component will almost always intersect a complementary-dimensional linear 
space exactly in a set of points and that the number of points is the same for almost any choice of linear 
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space. Again, this can be stated as a probability one guarantee or with Zariski open, dense sets, but we 
choose not to be that technical here. 

1.3 Software for numerical algebraic geometry 

Various numerical algebraic geometry software packages have been produced over the years. Currently, 
there are three main options: PHCpack [6], HOM4PS-3 [7], and Bertini [8], each with their own benefits 
and drawbacks. In this article, we used Bertini exclusively. 


1.4 Geometry specific to presented algorithms 

Theorem 1 of the main text should be familiar to those trained in multivariate calculus; this is essentially 
the method of Lagrange multipliers. Geometrically, this system of equations forces the normal directions 
of the objective function and the constraints to line up in the same direction (up to some scalar(s), the 
Lagrange multiplier(s)). 

When working with an irreducible component of the solution set of a system of polynomial equations, 
it is often useful to deal with a complete intersection. Said simply, the idea is that computations can be 
more difficult if there are more equations than necessary. To see why this might be true, let us consider an 
example from linear algebra. Suppose we have a single linear equation in three variables, defining a plane. 
Now suppose we consider a system of two equations consisting of that equation and twice that equation 
(having the same solution set). Then the matrix of coefficients of this linear system is not full rank (not 
a desirable situation) and we have two equations defining a geometric object that could be described by a 
single equation. 

In the nonlinear setting, the situation is quite similar. Having “too many” equations leads to an un¬ 
desirable rank-deficient Jacobian matrix. Suppose polynomial system F : C N —> C" has an irreducible 
component X of dimension N — m (codimension m). Then, again with probability one, the polynomial 
system A ■ F has X as an irreducible component but has the “correct” number of equations, where A is 
a random constant matrix with n columns and m rows. Here, “correct” means the number of equations 
matches to codimension m. We will refer to this method as squaring up. 

Finally for this section specific to the algorithms developed in the main text, we require the user to check 
two geometric facts, the meaning of which may not be entirely clear. 

1. Vm H Vv refers to the intersection of the model and data varieties, as defined in the main text. To 
find the intersection of two solution sets, it is sufficient to simply solve the system consisting of all 
equations appearing in the systems for Vm and Vv, i.e., the union of those two polynomial systems. 
There are more sophisticated methods, but this is sufficient. 

2 . The user must determine whether the intersection of (Vm)r and (Vm)r is empty. By this, we simply 
mean that one should search for real points in the intersection just described, e.g., using the method 
of [5]. 

2 Choice of test statistics and parameter estimates 

In this section, we justify our procedures for model validation and parameter estimation. 

2.1 Maximum likelihood 

Here we justify the assertion that the test statistic given in the main text is related to likelihood maximiza¬ 
tion. 

Consider first, for simplicity, the case of a single data point y = (yi ,..., y m ), which we assume is a 
perturbation y = £ + e of some unknown true value £ = (£i,..., £ m ), where each component e; of the error 
e = (ei,..., Cm) is an independent zero-mean Gaussian random variable with variance of. We are interested 
in computing the probability that y comes from a given model as defined by a model variety Vm- A point 
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on Vm has the form (a, x, y), where a = (ai,..., at) are the model parameter values, x = (aq,..., x n ) are 
the variable values, and y = (y i,... ,y n ) are the outputs.The probability that y comes from a given point 
(a, x, y) 6 V», he., that y is a perturbation of y where (a, x, y) € Vm for some a and x , is then 


Pr {y | a, 2/) = Pi '{y U = V) = II p r(y* I & = Vi)- 

i= 1 

This is also called the likelihood L(a, x, y) of (a, x , y) and we wish to find its maximizer over all (a, x, y) 6 Vm ■ 
This can equivalently be done by considering the log-likelihood, which gives 

m m /I (v -v l 2 \ 

log L(a, x, y) = ^ log Pr(^ | & = Vi) *5 ^ ( - log 2iraf - 1 ^ ) 

i= 1 i= 1 ' i / 

by normality. The maximizer ( a*,x*,y *) can therefore be found by solving the optimization problem 


d 2 


m , A ,2 

(y% - Vi) 

mm > -s-, 

(a,x,y)eV M ' (if 

1=1 1 


where the optimum is precisely the test statistic. The values a*, x*, and y* are the maximum likelihood esti¬ 
mates for, respectively, the parameters (estimation), the unobservable variable values (inference/recovery), 
and the true output values (filtering/denoising), 

The test statistic d 2 itself also has a useful interpretation as follows. Suppose that y comes from a point 
(■ a,x,y ) 6 Vm- Then 


rf2 = E 


(m 





a'a - Vi ) 2 

rr 2 


by definition. But regarding each as a random variable, each term (j)i — yi)/cji in the summation above is 
standard normal. Hence the right-hand side has a chi-squared distribution with m degrees of freedom (Xm)- 
The inequality should be interpreted by regarding ct 2 as a random variable subject to the same source of 
randomness. This can be written somewhat clearer as 

i= 1 a i 

where we have made explicit the underlying dependence of both sides on the same random realization w. 
The inequality then holds for each value of w. Consequently, we conclude that 

Pr(d 2 < u) > Pr (U < «), U ~ xL 


SO 


Pr(d 2 > p a ) < Pr(t/ > p a ) = a, U ~ Xm> 

where p a is the upper a-percentile for Xm- This can be used to test the hypothesis that y comes from Vm- 
The test statistic is also related to the log-maximum-likelihood as 

log L(a*,x*,y*) = ^ ^mlog27r + ^ logcr 2 - d 2 ^j , 

which is a useful quantity for model selection via, e.g., the Akaike or Bayesian information criteria. 

Now consider the case of multiple data points {j/^}? =1 . As before, we assume that each y^ = 

{ili\---,ym) is a perturbation y^ where each eP is an independent zero-mean Gaus¬ 

sian random variable with variance aj Instead of searching for one point on Vm , we now have to search 
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for p points (a, x^\y^) for j == 1 ,... ,p all with the same parameter values (since they come from the same 
fixed model realization). The probability that y b) comes from (a,x^\y^) for j = 1,... ,p is then 


Pr(y (1) ,.. ■ ,y ip) | a,x (1) ,?/ (1) ,.. .,x (p \y (p) ) = Pr(i/ W | = y W ) = nn pi '(%. i f. 01 =»!”> 

3= 1 3 = 1 1=1 

= L(o,a; ( 1 ) ,j/ ( 1 ) ,...,x (p) ,y (p) ) 


by independence. This is essentially the same as before except that we now loop over each coordinate of each 
data point. Therefore, the maximum likelihood estimates (a*,x(P , j/b ,... ,x^ , j/ p ) ) can be obtained 
by solving 


d 2 = 


(a,: 


;(!),. uU), 


X (P) 


yy 


(vP-vh 2 


The same arguments go through and we find that Pr(d 2 > p a ) < a for p a the upper a-percentile for Xmp 
The log-maximum-likelihood is related to d 2 as 

pm \ 

2 *+EE l °z a li-y) ■ 

3=1i =1 / 


log L(a*,xW\yW, 


,xM\y ip) * 


pm log 


3 Algorithm modifications 

The algorithms presented in the main text are in their simplest form. Some applications require modifica¬ 
tions, particularly if there are constraints on the variables or parameters. 


3.1 Solving the constrained optimization problem 

In many common settings, there exist constraints on the variable and parameter spaces. For example, in 
chemical reaction networks, the rate parameters are all assumed to be positive. Thus, when positivity or 
other constraints are present, instead of finding the weighted squared distance between two varieties, we 
are finding the weighted squared distance between two semi-algebraic sets , i.e. sets defined by polynomial 
equalities and inequalities as opposed to just polynomial equalities. Indeed, if we let Sm C (V»)r denote 
the semi-algebraic set associated to the model, e.g. Sm = V m nl>Q“ + ”‘, then the appropriate statistic is: 

kk, (y. _ y .) 2 

d 2 = min^J —-—^— subject to (a,x,y) 6 Sm- 

i =l 

In the case when a bound on the statistic d 2 is sufficient, then no additional work is needed. One can 
solve the system from Proposition 1, keeping in mind that the weighted squared distance between the closest 
pairs of points returned would be a lower bound on d 2 . If the closest point to (Vx>)r in (V»)m is also an 
element of Sm> then the squared distance would be exactly the statistic d 2 . 

When the exact value of d 2 is needed, then one should solve the Karush-Kuhn-Tucker (KKT) system 
of equations. Let F-],. F r , hi,..., h s be polynomials in the ring K[a l5 ..., a*,, X \,..., x n , yi ,..., y m \. Let 
Sm be the semi-algebraic set of all (a, x, y) 6 R fc +"+ m that satisfies 

Ft (a, x, y) ,s= 0 for * = 1 ,..., r 

hi ( a , x, y) < 0 for * = 1,..., s 
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Let Ai,..., A r , fJti, ■ ■ ■, fJ-s be indeterminates (these are the KKT multipliers). The KKT system is 


/ = 0 ( 1 ) 

A 1 V /1 + ... A r V/ r + AiiV/H + • • • y s X7h s + ^ ^ ~ ^ = 0 (2) 

Hihi = 0 (3) 

: (4) 

Lhh s = 0. (5) 


In order for (a, a:, y) to be a critical point, the point (a, x, y) must be a solution to this system and satisfy 
the inequalities hi < 0 for i — 1,..., s and Pi > 0 for i = 1,..., s. Thus, we can find the global minimum 
by using numerical algebraic geometry to solve the system defined by equations (1) - (5), then filtering the 
solutions appropriately. This combination of numerical algebraic geometry and the KKT equations was first 
employed in [9]. 

Alternatively, in some situations, it may be more efficient to minimize the objective function over (Vm)r 
and then check the boundary of Sm ■ We describe this method for when there are non-negative constraints 
on all indeterminates. After solving the constrained optimization problem on (V»)r, we assume, in an 
orderly fashion, that one of the indeterminates, say Xi, is zero. In this case, we are required to minimize 
the distance of Vm FI {xi = 0} to Vx>- This involves solving a smaller and related constrained optimization 
problem. In total, if there are N indeterminates, there are 2 N — 1 combinations to set to zero. This amount 
to solving 2^ — 1 Lagrange multiplier systems. 

One observation is that the number of systems that need to be solved explodes when N is large. Note 
however that the dimension of Vm LI {Xj = 0} is less than or equal to the dimension of V», with the 
inequality being strict when Vm Z { x i = 0}- Geometrically, there are no longer degrees of freedom in the 
variable Xi so the dimension is reduced. Furthermore, we can expect the dimension to be reduced by one. 
As we impose additional constrains, Xj = 0 for i j, the dimension may drop further. 

In practice, there are diminishing returns as you begin setting Xi to zero. That is, there exists some 
positive integer k such that Vm FI {x^ = ... = Xi k = 0} is zero dimensional for some indexing set *!<•••< 
ik- In this case, measuring the distance of Vm FI {x^ = . ,. ==ij, = 0} to Vv using a Lagrange multiplier 
method is unnecessary, and, as we continue imposing additional constraints on V», either the intersection 
is non-empty and every observable variable is set to zero, or the model variety becomes empty. Thus, it 
becomes redundant or unnecessary to check additional cases. 

For example, in Section 4.4.3, the MAPK model variety Vm is one-dimensional and Vm FI {x, = 0} 
becomes zero-dimensional for each i. For the cases where x$ = x.j = 0 for i ^ j , the intersection of the 
corresponding linear spaces with Vm becomes empty. Even though there are 2 1G — 1 = 65, 535 boundary 
cases to check, there are really only 16 relevant cases. See Section 4.4.3 for more explicit details on how this 
calculation carried out. 

3.2 Removing extinction components 

Given a model, it is quite common that the model variety is not irreducible but instead is the union of several 
irreducible components. In applications, it may be preferred to remove from consideration components that 
lie entirely in a coordinate hyperplane, since, in such components, one or several of the parameters and/or 
variables are equal to zero. For example, in a chemical reaction network, such a component is called an 
extinction component [10] since it captures the situation where one or more of the reactants have “run out.” 
It is common to want to avoid extinction components when estimating parameters. 

Removing components where a parameter or variable is equal to zero throughout the set from consider¬ 
ation can be done algebraically with saturation. In particular, if Im is the dehning ideal of the model Vm 
one should compute 

l-Main — -I M • (^1 ' ' ' ^k * x l ' ' ' X n IJl Vm) • — 

{/ G R[ai,... ,afc,xi,... ,x n ,yi,... ,y m \ ■ 3/c 6 N s.t. (01 • • • a*, • xi • • • x„ • yi ■ ■ ■ y m ) k f G Im}- 
This procedure can be performed using the saturate command in Macaulay2. 



To estimate parameters such that the best estimate corresponds to a point not on an extinction compo¬ 
nent of Vmi one should modify Algorithm 3, replacing Vm with V(lMain)- 


4 Results 

We provide details for the calculations of examples in the main text. All code is available at: 

http: //www.math, sjsu. edu/~egross/NAGModelSelection/AuxillaryFiles. 

4.1 Cell death activation 

We provide the details of the calculations for model compatibility of the cell death cluster model. This 
subsection includes detailed information regarding the solving schemes available in NAG software that were 
utilized. A summary for the practioner can be found at the end of the subsection. 

The model describes activation of apoptosis by death receptor Fas mechanisms [11]. The model includes 
constitutive receptor opening and closing, pairwise open Fas stabilization, higher-order open Fas stabilization 
enabled by FasL, and ligand-induced receptor opening. According to its conformational states, Fas is 
assumed to be one of three species: closed (X-j): open, unstable (X 2 ); and open, stable (X 3 ), i.e., active 
and signaling. Furthermore, let the ligand FasL be denoted by L. Then the model has the reactions 


X 2 ^ 

X u 

k 0 


* 

CO 

Jf 

X'2 , 

k « 


j)X3 

► U ~ k)X 2 + (i-j + k)X 3 , 

L + jX 2 +(i-j)X 3 ^~. 

> U - k)x 2 + (* - j + k)x 3 


for i 6 {2, 3}, j = 1,... ,i, and k — 1,... ,j. The first reaction describes spontaneous receptor opening and 
closing. The second reaction describes constitutive destabilization of open Fas. The third reaction describes 
cluster-stabilization by open Fas, independent of the presence of FasL. The fourth reaction describes cluster- 
stabilization events enabled by FasL. 

Assuming mass-action kinetics, the reactions can be translated as follows: 


{ Xl = ~Vi, 

x 2 = v\ + v 2 — v 3 — v 4 , where 

*3 =V 3 + V4~V2, 


Vl 

V2 

V3 

V 4 


= koXl + {-k c )x 2, 


k u x 3l 

= 6 ki 3 ^x 2 + 3k^x 2 x 3 + 3 k^x\ + ki^x 2 x\ + k^x 2X3, 

= Qk!f^x\l + 3k^x 2 x 3 l + 3fc| 2 ' 1 x 2 l + k[ 3 ^ X 2 X 3 l + k^a’ 2 X 3 I, 


where Vi are the reaction velocities for the variables Xi, and lowercase letters denote the concentrations of 
their uppercase counterparts. 

The model parameters for the cell death cluster model are 


a = {k 0 ,k c ,k u ,kf\kf\kf\k {3) ) 


and the variables are 


X = {i,Xl,X2,X 3 ) 


The outputs are 


V = (A,p,C) 


representing, respectively, the total ligand concentration, the total receptor concentration, and the total 
downstream “death signal”, as given by the equations 


A -£ = 
p-{xi+x 2 +x 3 ) = 

( — X 3 = 


7 


0 

0 

0. 


( 6 ) 

(7) 

( 8 ) 



We set the model equations (iq, aq,2:3) to zero, and, together with equations (6)-(8), we obtain the defining 
equations for the model variety Vm- The ambient space that Vm is contained in has dimension 14. This 
space has coordinates defined by both the model parameters a, the variables x and the outputs y. 

Given an observable data point y = (A, p, £), we define the data variety as: 

Vv = {(*, a, y) 6 C 14 : A = A, p = p, C = C} (9) 

for the clustering model. Note that Vx> has dimension 11 since there are no degrees of freedom in the 
variables A, p, or £. 

We first compute a numerical irreducible decomposition (NID) of Vm using Bertini; this will aid in 
understanding Vm H Vv- One can verify, after computing the NID for Vx, that Vm is a 9-climensional 
complex algebraic set of degree 10 (file name: Cluster_Model_NID ). 

Now suppose we are given the following data point (taken from the model without noise): 

y = (A, p, C) = (1.7784308,2.31883024, 2.16896112). 

One can then verify using the NID that Vm O Vv 7^ 0 (file name: Cluster_Model_Data_NID). That is, the 
intersection of the model variety and data variety is nonempty. Specifically, Vm H Vv is a 6-dimensional 
complex algebraic set of degree 5. Adding noise to the coordinates of y taken from _A/(0, 0.1) did not affect 
the dimension or degree. Again, we got that Vm O Vv has dimension 6 and degree 5. 

Since we are interested in model compatibility, our goal is to find at least one nonnegative point in 
(Vm)h n (Vx>)r. The above computation at least provides evidence that this is the case, but it may be 
possible that Vm H Vv does not contain any nonnegative real points or any real points at all for that matter. 
We will approach this problem using the methods described in [5]. If Vm H Vv contains a real nonnegative 
point then we cannot reject the model and may conclude that the model is compatible with the data. If 
(Vm)r l~l (Vx>)r does not contain a real point then we can try and use a more general Lagrange multiplier 
method similar to the one employed in Section 4.4.3 in dealing with model selection. 

We first randomly select a real, positive point: 


t = 6.491154749564521 
x 1 = 7.317223856586703 

x 2 = 6.477459631363067 
x 3 = 4.509237064309449 
k a = 5.470088922863450 
k c = 2.963208056077732 
k u = 7.446928070741562 
k (2) = 1.889550150325445 

k ( f> = 6.867754333653150 
k {2) = 1.835111557372697 

fcj 3) = 3.684845964903365 

where each coordinate is chosen uniformly on the interval [0,10]. This point will determine the observable, 
i.e. output, variables A,p, and £ using equations (6)-(8). Call this point (o*,x*,y*) 6 R 14 . For the time 
being, the left and right endpoints of each subinterval [0,10] have been chosen arbitrarily. It is unclear, at 
this time, how the endpoints or length of the interval affects the performance in finding nonnegative real 
points contained in (Vju)r fl (Vx>)r using the methods described below. 

Our aim then is to solve the constrained optimization problem: 


minimize \\(a,x,y) — (a* ,x* ,y*)\\ 2 

v 

subject to (a, x, y ) 6 (Vx)r O (Vd)r- 


( 10 ) 


Geometrically, we are minimizing the distance between the chosen point (a *, x *, y*) and (Vm)r O (Vx>)r. 



We will refer to the system defining (Vm) H (Vv) as f*(a,x,y). Squaring up the polynomial system 
will be a necessary step when utilizing the perturbed regenerative solving scheme. Squaring up was briefly 
described in Section 1.4 but we will give additional details here. First notice from previous computations 
that Vm FI Vv has codimension 14 — 6 = 8. Thus, there exists a nonempty Zariski open set A C C 8x9 such 
that for every matrix A G .4., we have Vm (~l Vv Q V(Af*(a,x,y)). This means we may take 8 random 
C-linear combinations of the equations defining Vm Cl Vx> and, with probability one, still cut out at least 
V m n Vv- It is sufficient to sample the elements of A uniformly along the complex unit circle. 

As a side note, we may take additional steps to reduce complexity by replacing the matrices A with 
matrices of the form [/ 8 | b] C A where J 8 denotes the 8x8 identity matrix and b denotes a 8 x 1 column 
vector who elements are sampled uniformly along the complex unit circle. This has the effect of adding a 
random multiple of the last function to each of the other functions. One may verify a posteriori if a point 
x € V(Af*(x)) is also in Vm 0 Vv by function evaluation of f*(a,x,y). 

The polynomial system for the optimization problem (10) is: 


hi(a, x, y) 

= 0, for 1 < * < 8, where hj(a,x,y) = £® =1 A jfc /j!(a,2:,y) with A jk = 

(11) 

ai - a* 

= E s d hj (a,x,y) x iotl<i<7 

3 <>"i 

(12) 

Xi - X* 

= E s d hj (a,x,y) x iotl<i<4 

OXi 

(13) 

vi - y* 

= E 8 d hj (a,x,y) A fori< i < 3 . 

' % 

(14) 

We call A = (Ai,.. 

., A 8 ) the Lagrange multipliers for Vm H Vv- This is a system of 22 variables 

and 22 

equations. Written more compactly we may write equations (11)—(14) as: 



h(a, x, y) = 0 

(15) 


(a, x, y) ~(a*,x*,y*) = J k (a,x,y)X T 

(16) 


where Jj l (a,x,y) is the Jacobian matrix of h,(a,x,y). The Lagrange multipliers in equations (16) need not 
be real since we are taking C-linear combinations of f*(a,x,y). 

Regeneration is a numerical algebraic geometry method we found to be most relevant to solve equations 
(15)—(16) and is implemented in Bertini. Regeneration uses a linear product homotopy scheme in which 
each equation is built up one at a time. Depending on the ordering and structure of each equation, that can 
lead to huge computational savings. However, regeneration is restrictive in that it may not find all singular 
isolated solutions to equations (15)—(16). For the finer details of regeneration see reference [2]. 

One small adjustment we can do to solve this problem is to first solve a slightly perturbed problem (this 
is the strategy employed in [5]). Indeed, there is a nonempty Zariski open set T such that for every 7 e T 
the solutions to: 


h{a,x,y)~ 7 = 0 (17) 

(a,x,y) T - (a*,x*,y*) T = J k (a,x,y)X T (18) 

will be nonsingular. The benefit to first solving this system is that regeneration will now find all solutions. 
After the solutions to equations (17)—(18) have been computed, one can use a parameter homotopy to 
compute all isolated solutions (15)—(16), which may contain singular solutions. 

We briefly describe the parameter homotopy employed following regeneration. The solutions of equations 
(17)-(18) lead to the solutions of equations (15)—(16) through a collection of homotopy paths where each 
homotopy path as functions of t are solutions to the straight-line homotopy function: 


H(a,x,y,t ) 


h(a, x, y) - fy 

(a, x, y) T - (a*, x*, y*) T - J', f (a, x, y)\ T 


for t 6 (0,1] cl. As t- —> 0, we obtain numerical approximations to the solutions of equations (15)—(16). 
Additional details on parameter liomotopies can be found in [1, 2]. A basic implementation of parameter 
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Table 1: Expected timings collected over 20 runs. The table includes the average time and standard 
deviations associated to the four computations described in this section. 



Timing 

Compute Vm 

0.74 

sec ±0.09 sec 

Compute Vm H Vt> 

0.38 

sec ±0.04 sec 

Regeneration (parallel) 

6.09 

sec ± 0.61 sec 

Parameter Homotopy 


0.03 sec 


Table 2: Nonnegative real solutions to (V.m)r fl (Vx>)m 



Solution 1 

Solution 2 

l 

1.7784308 

1.7784308 

Xi 

0.0545838 

0.0141547 

X 2 

0.0952853 

0.1357144 

X 3 

2.1689611 

2.1689611 

A 

1.7784308 

1.7784308 

P 

2.3188302 

2.3188302 

c 

2.1689611 

2.1689611 

k a 

5.3966315 

0.1924532 

k c 

3.0914404 

3.2734881 

ku 

3.9540082 

0.2856796 

J2) 

1.9881072 

1.2768451 

1,(3) 

7.6931353 

6.9985113 


1.9131209 

1.8363315 


3.6997123 

3.6848663 


homotopies is found in Bertini. Input hies for the regeneration and parameter homotopy runs may be 
found in the hies Cluster_Stepl and Cluster_Step2. 

Timing summaries for the clustering model can be found in Table 1. In all cases, we have employed 
the use of intrinsically dehned variables (see Appendix F.1.2 of [2]). These timings include computing the 
numerical irreducible decomposition of Vm, the numerical irreducible decompsition of Vm hi Vt>, computing 
the solutions to equations (17)—(18) using regeneration, and computing the solutions to (15)—(16) using the 
parameter homotopy. We found it most appropriate to utilize Bertini in serial for each computation except 
for regeneration which was done in parallel. Serial runs were done using a Apple MacBook Pro with 2.4 
GHz Intel “Core i5” processor. Parallel runs were done using 24 (2.67 GHz Xeon-5650) compute nodes on 
the CentOS 5.11 operating system. In total, after reviewing the solutions, there are three solutions that 
correspond to real points contained in (V»)r H (Vx>)r. Among the three real solutions, two solutions are 
nonnegative. Solutions are listed in Table 2. One can verify that these are indeed solutions to (Vm)rG (Vu)r 
by function evaluation of f*(a,x,y). 

We may conclude from these computations that the clustering model Vm is compatible with the observ¬ 
able data y. It may be the case that there is a very large number of data points, y , for which we would 
like to determine model compatibility. Fortunately, by employing a parameter homotopy scheme we may 
solve this problem rapidly where given each data point, the computations will be on the same order as the 
parameter homotopy solve in Table 1. 

In summary the steps for model compatibility for the clustering model are as follows: 

1 . Determine the dimension of Vm l~l Vt> using the NID. 

2. Using the information gathered in Step 1, select a random point whose coordinates are sampled 
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Table 3: Each reaction described highlights whether the reaction is a forward or reversible reaction by the 
arrows. Here i = 1,2. 


monomer-dimer (Mi) 

dimer-dimer (M 2 ) 

single operator (M3) 

Xi ^4 X, + Pi 

X, ^5 ^ + Pi 

X, ^4 Xi + P, 

P, ^>0 

Pi ^4 0 

Pi-^4 0 

2 P 2 4^ P2P2 

2 p 2 4 ^ P2P2 

2 p 2 4^ p 2 p 2 

kkR 

kkR 

kkR 

Xl + P2P2 4^ X1P2P2 

Xi + P2P2 4^ X1P2P2 

x 2 + p 2 p 2 4^ X 2 P 2 P 2 

k n R 

k n R 

kqR 

X 2 + Pi^ X2P1 

k C R 

x 2 + Pi Pi 4^ X2P1P1 

k 0 R 

2 Pi 4 ^ Pi Pi 

k L R 

X 2 P 2 P 2 X 2 P 2 P 2 + P 2 


uniformly among a nonnegative closed interval and set up equations (17)-(18). 

3. Solve the perturbed equations (17)—(18) from Step 2 using a regeneration scheme, for example by 
setting USEREGENERATION to 1 in Bertini. 

4. Solve equations (15)—(16) using a parameter homotopy. Starting solutions are among the solutions 
gathered in Step 3. 

5. Filter the solutions gathered in Step 4 to determine if there are nonegative real solutions. 

6 . We conclude that the model variety V vi is compatible with the data since there is at least one solution 
found in Step 5. 

4.2 Synthetic biology and experimental design 

We demonstrate an example from synthetic biology with excess intersection (dim(V_A^ fl Vx>) > 0). We 
compare three bistable bio-circuits models analyzed in [12]: monomer-dimer toggle circuit (.Mi), dimer- 
dimer toggle circuit (.M 2 ), and single operator gene circuit (.M 3 ), which were initially presented in [13-15]. 
The model variables include concentrations of genes (X;) and proteins (P ; ) where i = 1, 2 as well as species 
complexes of the form .V Pi Pi. 

We follow the same notation for variables and parameters as presented by [12]. The reactions governing 
each of the models are given in Table 3. 

These variables interact following mass-action kinetics and form systems of polynomial differential equa¬ 
tions where Mi, M 2 , and M 3 have 7, 8 and 6 model variables, respectively, and 10, 12, and 9 kinetic 
parameters, respectively. The models can be reduced by assuming that the total amount of gene 1 (Xi tot ) 
and gene 2 (X 2 tot ) is conserved and these polynomial systems for each model are as follows. For simplicity, 
we use Pn and P 22 for P\P\ and P 2 P 2 , respectively. 

The monomer-climer toggle circuit (.Mi) system is: 


— kdeglPl — k c FX 2 Pl + kbaslXl + k c fl(X 2tot — X 2 ) = 0 
-2k kF Pl — kdeg2P2 + 2fcfcflP22 + fcbas2X 2 = 0 
k kF P 2 — kkB.P 22 — k n Fp 22 Xl + k nF i(X i tot — Xl) = 0 
k n R (Xi tot — Xl) — k n F P 22 X 1 = 0 
k cR (X 2 tot — X 2 ) — k c pPi X 2 = 0. 
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The model dimer-dimer toggle circuit (M 2 ) system is: 

— 2kiFPi — kde gi Pl + 2fc;i?Pn + kbaslXl = 0 

kiFPf ~ kiRPil ~ k 0 FPnX 2 + k oR (X 2tot — X 2 ) = 0 
— 2 kkFP 2 — kdeg 2 P 2 + ‘2kk R P 22 + kbas 2-^2 = 0 
kkFP 2 ~ kkRp22 + k n Fp 22 Xi + k nR (Xi tot — Ad) = 0 
k n R (Xi tot — Xi) — k n FP 22 Xi = 0 
k 0 R(X 2tot — X 2 ) — k 0 FPnX 2 = 0. 

The model single-operator positive feedback circuit (M 3 ) system is: 

kbas 2 X 2 — kdeg 2 P 2 ~ 2+ 2 kkR.P 22 + k w (X 2tot — X 2 ) = 0 
kkFP 2 — kkRp 22 — k q pP 22 X 2 + k qR (X 2tot — X 2 ) = 0 
kq R (X 2 tot — X 2 ) — k q FP 22 X 2 = 0 

In this example, we suppose that the total amounts Xi tot and X 2tot and specific protein synthesis 
and degradation parameters kb aSl , kb a s 2 > kd egi , and kd eg2 are known and we assume that our data are 
measurements of Pi, P 2 , and their complexes P \\, and P 22 , i.e. y = (Pi, P 2 , Pn, P 22 )- The aim is to 
select the best model Mi, M 2 , and M 3 given the data. We simulate steady-state data ( Pi, P 2 , Pn,P 22 ) = 
(0.4224, 2.4153, 0.9022, 0.4758) from the dimer-dimer toggle model (M 2 ) using the following parameter and 
variable values: 


parameter 

value 

parameter 

value 

X lt , 

1.2099 

k„F 

1.3566 

X 2f f 

tot 

2.0660 

k n R 

0.6521 

kbasi 

0.8718 

koF 

1.5169 

fobas 2 

1.6930 

koR 

1.0661 

kdegi 

1.2550 

Kf 

3.3169 

kdeg 2 

0.6341 

ki R 

0.6559 

kkF 

0.6580 

kqF 

0.5057 

kkR 

8.0681 

kqR 

0.4844 

kcF 

0.4675 

k w 

0.1478 

k cR 

1.1636 




We add Gaussian noise from A/”(0,0.1) and then find dim(Vx, n Vx>) for i = 1,2,3. We can compute 
the dimension of each intersection using the dim command in Macaulay2 or by computing a numerical 
irreducible decomposition in Bert ini; we find: 

dimfkxj n (Vv) = 3, 
dim(VA 4 2 n Vt>) = 4, 

dim(Vx 3 n (Vx>) = 3. 

Some further computations are required to find dim((Vx i )Rn (Vd)r. Specifically, we need to find real points 
in each intersection and determine whether or not those points are smooth. Computing the dimension of 
the real part of the intersections is more work than necessary for Algorithm 2, however, it provides an 
illustrative example on how to work with real varieties and the algorithm in [5]. 

Let /M = 0 be the polynomial system defining Vm, H Vv for i = 1,2,3 and let u/ 1 ) 6 R 17 , 6 R 20 , 

w/ 3 ) 6 R 15 be random points. Let be the vector of indeterminates (unknown parameters and variables) 
for *th model, and let q be the codimension of Vm, i~i Vv- We can find a real point on every component of 
each interesection, by solving the system: 

/ (i) = 0 , 

ArVA (i) + ... Ac,V/W + ( X M - u>«) = 0. 
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(19) 

( 20 ) 



This is a simplified version of the system in Theorem 5 from [5]. As a remark, notice the similarity of 
the (19)-(20) to the system in Theorem 1. Algorithms for finding real points on a variety have been built 
on algorithms for minimizing the distance between a point and a variety since [16]. 

Once we have a real point on every component oIVm^Vv, we can quickly determine dim(V_A 4 i )Kn(Vx>)]R 
if those real points are smooth. Indeed, if V is an irreducible variety, then dim V = dim Vr if V contains a 
real smooth point (see [2, §14.1]). Checking whether a real point is smooth can be done by evaluating the 
Jacobian Vjk, n Vv at the point; if the Jacobian has full rank, then the point is smooth. In our case, for 
the three models, every point we find is smooth and thus we are able to reach the conclusion: 


dim((V» 1 )R H (Vd)r) = 3, 
dim((V. m 2 )r fl (Vd)r) = 4, 
dim((V» 3 )R H (Vd)r) = 3. 

The dimension analysis of the varieties Vm, H Vv informs us about the minimum number of additional 
variable and parameter values that must be measured to ensure V» fl Vv — 0- For Mi we need to know at 
least 4 more variable and/or parameter values, for M 2 we need to know at least 5 more, and for M 3 we need 
to know at least 4 more. Thus for the remainder of the example, we assume that we the parameters k c F , 
k c u , k n F, and kkF are known in Mi, the parameters kkF, k n F , kiF , k 0 p , and k 0 R are known in M 2 , and 
the parameters kkF, k q F, k q v and k w are known in M 3 . The model M 3 is an example where the number 
of additional parameters and/or variables that need to be known/measured exceeds the amount predicted 
by the dimension analysis. 

Now that all the intersections are empty, we run Algorithm 2, using the regeneration methods in Bertini 
to solve the systems resulting from Theorem 1. We find that the sum of squares (Eq. (3.1)) for each model 
are as follows: d\ = 2.116, d\ = 0.000124, and d 3 = 0.6333. Therefore, we select the M 2 model, which 
matches the model from which the data was generated. 

Solving the zero-dimensional system for the monomer-dimer toggle circuit, Mi, took 1 minute and 3 
seconds on an Apple MacBook Pro with a 2.6 GhHz Intel Core i5 processor. Solving the system for the 
dimer-dimer toggle circuit, M 2 , took 1 minute and 43 seconds, and solving the system for the single-operator 
positive feedback circuit, M 3 took 0.092 seconds. 

4.3 Epidemiology HIV 

To demonstrate parameter estimation we use a model that includes long-term HIV dynamics from ini¬ 
tial virus, latency, and virus increase [17], based on [18]. Model variables x are uninfected CD4+T cells 
(T), infected CD4+ T cells (7)), uninfected macrophages (47), infected macrophages (47,), and HIV virus 
population ( V ). The reactions are considered for this model are shown in Table 4. 

From these reactions, the dynamics are described by the following equations: 


T = si + kiTV - k 2 TV - SiT 
7) = k 2 TV - 5 2 Ti 
47 = s 2 + k 3 MV - k 4 MV - S 3 M 
Mi = k 4 MV - S 4 Mi 
V = k 5 Ti + k 6 Mi - S 5 V 

Using Macaulay2, we find that the model variety Vm has two irreducible components, the main compo¬ 
nent Vi defined by the ideal 

h = (5742M - 2453Mj - 130500,2599087) - 4660747, + 4840000J 5 - 20200500, 

17721T + 4660747, - 4840000<5 5 + 2479500,484000V<5 5 - 18454747, + 4840000<5 5 - 20200500, 
245347V - 7260047, + 130500V) 
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Table 4: Reactions for HIV model. The published parameter value is used from [17], see references therein. 


Description 


Reaction 


Parameter value 


10 


Generation of new CD4+T cells 

Generation of new macrophages 

Proliferation of T cells by presence of pathogen 

Infection of T cells by HIV 

Proliferation of M by presence of pathogen 

Infection of M by HIV 

Proliferation of HIV within CD4+T cell 

Proliferation of HIV within macrophage 

Natural death of CD4+T cells 

Natural death of infected T cells 

Natural death of macrophages 

Natural death of infected macrophages 

Natural death of HIV 


0 M 

1.5 x 10 -1 

T + V-^{T + V)+T 

2 x 10“ 3 

T + V^T, 

3 x 10 “ 3 

M + V % (M + V) + M 

7.45 x 10 “ 4 

M + V % Mi 

5.22 x 10 “ 4 

Ti V + Ti 

5.37 x HT 1 

Mi % V + Mi 

2.85 x 10 _1 


0.01 

Ti ^ 0 

0.44 

M ^ 0 

6.6 x 10“ 3 

Mi ^ 0 

6.6 x 10“ 3 

V^>0 

3 


and an extinction component V 2 defined by the ideal 

I 2 = (V, Mi, 11 M - 250, Ti, T - 1000) 

We estimated the natural death of the virus, parameter <$ 5 , using Algorithm 3 with main component Vi 
in place of Vm (see Section 3.2). For Vt>, we used the long-term non-progressors steady-state value (Table 
3 of [17]) and added noise to each variable ~ A/”(0,1). In particular, the data variety Vv is defined by the 
equations: 


T- 


Ti - 


M - 


6383 

20 ” 

937 

20 

8109 


Mi - 


V - 


100 

13667 

100 

2121 


100 


0 

0 

0 

0 

0 


For si,s 2 ,ki,... ,k 6 ,Si,. 
The varieties Vi and 


.. , 84 , we treated these parameters as known using the values from Table 1 of [17]. 
Vt> do not intersect, which we can confirm with Macaulay2. Using Bertini, we 
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solve the following system from Theorem 1: 


5742A7 - 2453 M, - 130500 = 0 

259908T) - 46607Mj + 4840000<5 5 - 20200500 = 0 

17721T + 46607M, - 48400005s + 2479500,484000K5 5 - 184547M; + 48400005s - 20200500 = 0 
2453 MiV - 72600 Mi + 130500V" = 0 
T+ 17721A 3 - 6383/20 = 0 
Ti + 259908A 2 - 937/20 = 0 
M + 5742Ai - 8109/100 = 0 

2453A 5 + Mi - 2453Ai - 46607A 2 + 46607A 3 - 184547A 4 - 72600A 5 - 13667/100 = 0 
4840005 5 A 4 + 2453MiA 5 + V + 130500A 5 - 2121/100 = 0 
4840001/A 4 + 4840000A 2 - 4840000A 3 + 4840000A 4 = 0 

There are 16 complex solutions to this equation, 3 of which are real. The real point resulting in the smallest 
sum of squared errors cP = 0.2884 is: 

( T , Ti, M, Mi, V, 65 , Ai, A 2 , A 3j A 4 , A 5 ) = 

(319.408,46.404,81.1544,136.767,21.3079,2.99876, 

- 0.0000112074,0.00000171594, -0.0000145481, -0.00000519486,0.0000159701) 

Thus, Algorithm 3 returns 5s = 2.99876, which we can compare to the true value 5 5 = 3. 

4.4 Multisite phosphorylation 

Here we describe the details for the multisite phosphorylation model selection and parameter estimation 
computations. First we describe the relevant biology, next we present the mathematical models of the 
distributive and processive mechanisms, then we apply our model selection method using data from [19]. 
We also estimate the relationship between the EGF concentration and activation of the pathway described 
by the parameter ki (see Table 11). 

4.4.1 Biology of MAP Kinase system 

Many cellular decisions are governed by molecular post-translational modifications. One type of modifica¬ 
tion, phosphorylation, is the addition of a phosphate group to a site of a substrate by an enzyme called 
a kinase. Some proteins (substrates) require multiple phosphate groups to be added by the kinase before 
the protein in activated/de-activated by these modifications. One well-studied signaling system is the MAP 
Kinase pathway, with kinase MEK and its substrate ERK; however the mechanism by which the phosphate 
group is added has been debated. Either MEK could phosphorylate ERK, disassociate and then phospho- 
rylate again, called distributive', or MEK could bind and phosphorylate in sequence, called processive. Aoki 
et al [19] showed experimentally (with mathematical models) that this mechanism is different in vitro than 
in vivo. This experiment included 12 different levels of EGF stimulus ranging from 0.0244140625 ng/mL to 
50 ng/ML. EGF actives cRAF which then phosphorylates MEK and finally doubly phosphorylates ERK. 
The data are measurements of three replicates of nonphosphorylated ERK (np-ERK), tyrosine monophos- 
phorylated ERK (pY-ERK), and doubly phosphorylated ERK (pTpY-ERK), at each stimulus level. These 
data are given as percentage of total ERK (ERK), so we use the concentration measurement for each of 
these ERK states. 

4.4.2 Mathematical models 

The model variables and parameters are given in Table 5. The model parameters for the distributive model 
are 

a = (fci, ■ ■ •, k 2 r, ci, c 2 ), 
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Table 5: Description of variables and parameters for distributive and processive MAP Kinase models 


variable 

species 

parameter 

name 

parameter 

name 

*1 

MEK 

fci 

kphos JMEK _pMEK 

fel5 

kdphos_pY_np_cyt 

*2 

cRAF 

kg 

kdphos _pMEK _MEK 

fel6 

kdphos_pT_np_cyt 

*3 

pMEK 

k 3 

kf MEK ERK binding 

fel7 

kdphos_pTpY_pY_nuc 

*4 

np-ERK _cyt 

&4 

kb_MEK_ERK_dissociation 

fel8 

kdphos_pTpY_pT_nuc 

*5 

MEK_np-ERK 

h 

kimport_np 

fel9 

kdphos_pY_np_nuc 

*6 

np-ERK nuc 

k 6 

kexportmp 

k -20 

kdphos_pT_np_nuc 

*7 

pY-ERK evt 

k 7 

kinrportqpY 

fe21 

kphos_np_pY 

x 8 

pY-ERK nuc 

k 8 

kexport_pY 

fe22 

kphos_pY_pTpY 

*9 

pT-ERK_cyt 

kg 

kinrport_pT 

fe23 

kphos_pT_pTpY 

*10 

pT-ERK_nuc 

kio 

kexport_pT 

k 2 A 

kf MEK ERK binding 

*11 

pTpY-ERK_cyt 

fen 

kimport _pT pY 

fe25 

kb_MEK_ERK_dissociation 

*12 

pTpY-ERK_nuc 

fe’12 

kexport_pTpY 

fe26 

kphos_np_pY 

*13 

pMEK_np-ERK 

fe’13 

kdphos_pTpY_pY_cyt 

fe27 

kphos_pY_pTpY_MEKERK 

*14 

pMEK_pY-ERK 

fcl4 

kdphos pTpY pT cyt 

C2,Cl 

cyt_vol, nuc_vol 


Table 6 : Reaction velocities for the MAP Kinase distributive and processive model. The processive model 
uses the additional reaction velocities *i8 ; *i9;*20- 


V\ = fci*i* 2 - k 2 x 3 

*2 = fe3*l*4 — fe’4*5 

*3 = fe-5*4 - C 2 fe6*6 

v 4 = fc 7 * 7 - c 2 k 8 x 8 

*5 = kgXg — C2fel0*10 

V 6 = fen*n - C 2 fel2*12 

V7 = fel3*ll 

*8 = fel4*ll 

Vg = fci 5 *7 

*10 = fel6*9 

*11 = C 2 fel7*12 

*12 = C 2 fel8*12 

*13 = C 2 fcl9*8 

*14 = C2fe’20*10 

*15 = fe21*3*4 

*16 = fe22*3*7 

*17 = k 23 X 3 Xg 


*18 = fe24*3*4 - fe25*13 

*19 = fe26*13 

*20 = fe27*14 


the variables are 

* = (xi,. .., *12 , cRAF tot , MEK tot , ERK tot ), 

and the outputs are 

y = (np-ERK, pY-ERK, pYpT-ERK). 

The variables for the processive model are the same as for the distributive model except for two additional 
variables * 13 , * 14 . The reaction velocities are given in Table 6 and the corresponding equations are given in 
Table 7. Note in Table 7 that there are various conserved species concentrations in addition to the ODEs. 

We use the in vitro parameters estimates from Table S2 in reference [19] for k- 2 , ■ ■ ■, ci, C 2 and the 
conserved quantities MEK tot , cRAF fot , ERK tot , as given in Table 8 . The remaining parameter, k±, describes 
the rate of MEK phosphorylation and depends on the level of EGF stimulation, which varies throughout 
the data. Thus, we left it as a free variable and estimated it as a byproduct of distance minimization (11). 
The output variables are np-ERK, pY-ERK, and pYpT-ERK, which are sums of species concentrations. 
For the distributive model, the output equations are: 


np-ERK — (* 4 + * 5 + * 6 ) 

= 0 

( 21 ) 

pY-ERK — (x 7 + * 8 ) 

= 0 

( 22 ) 

pYpT-ERK — (*n + * 12 ) 

= 0 

(23) 
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Table 7: Equations for distributive and processive MAP Kinase models 


Variable 

Distributive 

Processive 

±i = 

-Vl - v 2 

-Vl - v 2 

x 2 = 

0 

0 

X 3 = 

Vl 

Vl - Vis + V20 

±4 = 

• V‘2 ~ V 3 + Vg + Cio - V 15 

-v 2 - Vz + Vg + v w - Vis 

is = 

V 2 

V 2 

X 6 = 

V3 + V i3 + Vu 

V 3 + V13 + Vi 4 

X 7 = 

-Vi + V 7 - Vg + V15 - Wi 6 

—V 4 + V 7 ~Vg~ Vie 

*8 = 

Vi +Vn - Viz 

v A + Vn - V 13 

Xg = 

-V 5 + v s - Vio - Vn 

-v 5 + v 8 - vio - vn 

XlO = 

V 5 + Vl 2 - V 14 

V 5 + V12 - V 14 

in = 

-v 6 - v 7 -V 8 + vis + Vn 

—vq — v 7 — vs + Vi6 + V17 + V20 

il2 = 

v e ~ vn - vn 

v 6 - vn - vn 

ii 3 = 


Vis - Vig 

ii4 = 


Vl9 — V20 

0 = 

MEK tot - {xi + x 3 + x 5 ) 

MEK tot — (,ti + x 3 + x-e + X 13 + xn) 

0 = 

cRAF tot - x 2 

cR AF tot - x 2 

0 = 

ERI< tot -Ei= 4 *i 

ERK tot - Yjiti x-i 


Table 8: Parameter values for MAP Kinase models 


parameter 

value 

parameter 

value 

parameter 

value 

k 2 

0.0096 

ki3 

0.004 

k 2 i 

0.18 

k 3 

0.18 

k u 

0.0055 

k 2 5 

0.27 

hi 

0.27 

kn 

0.0067 

k 2 e 

0.073 

k 5 

0.0017 

kie 

0.0068 

k 27 

0.05 

k 6 

0.013 

kn 

0.0032 

Cl 

1.0 

k 7 

0.0025 

kis 

0.0038 

C 2 

0.2 

ks 

0.017 

kig 

0.0077 

c.RAF tot 

0.013 

kg 

0.0022 

k-20 

0.0058 

MEKtot 

1.2 

kio 

0.049 

k- 2 i 

0.039 

ERK( 0 t 

0.74 

kn 

0.0082 

k 22 

0.021 



k\ 2 

0.0076 

k-23 

0.02 
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whereas for the processive model, we include two additional species, so the output equations become: 


np-ERK — (x4 + X 5 + x e + £13) 

= 0 

(24) 

pY-ERK — (X7 + xg + X14) 

= 0 

(25) 

pYpT-ERK — (aqi + x 12 ) 

= 0. 

(26) 


4.4.3 Model selection and parameter estimation computations 

The model variety of the distributive model is defined by (21) - (23) and the equations obtained by 
setting the “Distributive” column of Table 7 equal to zero. We will refer to the system defining V_M d as 
F = 0 . The model variety Vm p of the processive model is defined by (24) - (26) and the equations obtained 
by setting the “Processive” column of Table 7 equal to zero. The ambient dimension of \ > M d is 16 since the 
coordinates that define V_M d include x\,... ,x± 2 , np-ERK, pY-ERK, pYpT-ERK, and the model parameter 
fci; all other parameters and variables we treat as known constants. Similarly the ambient dimension for 
the processive model is 18 as we include the additional variables X 13 and X 14 . 

Given data y = (np-ERK, pY-ERK, pYpT-ERK) we define the data variety for the distributive model 
as: 


Vv d = {(a, x, y) E C 16 : y = y] 

The data variety Vv d has dimension 13. The specific data that we used may be found in the supplemen¬ 
tary data hie (aoki_data.txt). The data variety Vv p for the processive model is dehned similarly. 

The computations that follow will be for the distributive model. The computations for the processive 
model will be nearly identical so we do not describe them in the same level of detail. When the results are 
discussed we will be sure to record information for both models. 

We first compute a numerical irreducible decomposition (NID) of using the Bertini.m2 [20] the 
Macaulay2 interface for Bertini to solve [ 8 , 20, 21]. With the NID, one can verify that is a one¬ 

dimensional complex algebraic set of degree 8 (filename: MAPKYLModel _NID). Similarly for the proces¬ 
sive model, the model variety V» p is a one-dimensional complex algebraic set of degree 11 (filename: 
MAPK_P_Model_NID). There are several variables that may be intrinsically dehned to save computation. For 
example, xi,X2,xy , and in can be written in terms of the other variables followed by X4. One may also 
verify that fl Vx> d = 0 using Bertini (hlename: MAPK_D_Model_Data_NID). Here, one can also dehne 
the variables np-ERK, pY-ERK, and pYpT-ERK intrinsically to save computation. Since Vvf d fl Vp d = 0 , 
Algorithm 2 instructs us to minimize the distance between (Vm^r and (Vpq)]®. 

Squaring up the polynomial system defining V)\ 4 d will be a necessary step to construct the polynomial 
system from Proposition 1. This procedure was described briefly in Section 1.4 and in more detail in Section 
4.1. The codimension of Vj^ d is c = 16 — 1 = 15, the dimension of the ambient space minus the dimension 
of VM d as determined by the numerical irreducible decomposition. Let A 6 C 15x1 ' whose entries are taken 
randomly from the complex unit circle. The polynomial system from Proposition 1 becomes: 


0 

0 

Vi - m 


0, for 1 < i < 15, where /j( x) = T} k=l A jk F k {a,x,y) with A jk = [A\^ k 


E 15 

df' 

(a, x. 

• y) 

A 





L 

da\ 


Aj, 




^15 

2 j j= : 

df\ 

(a, x. 

A 





J 3 

L 

dxi 

A ji 

for 

1 < '< 

i < 12 

^15 

df' 

(a, x. 

y) 





j j 

L 

dyi 

A ji 

for 

1 < '< 

i < 3 


(27) 

(28) 

(29) 

(30) 


The variables A = (Ai,..., A 15 ) are the Lagrange multipliers. This is a system of 31 variables and 31 
equations. We collect the solutions (a, x, y , A) 6 Rx R 12 x R 3 x C 15 to equations (27)-(30) and then compute 
||y — y H 2 for each solution. 

In Section 3.1, we explained an issue that can arise in solving constrained optimization problems such 
as ones arising from problem (l)-(5). In this example, we want to ensure that X \,..., X 12 , ai, y\ 1 1 / 2 , Vz are 
non-negative, i.e. SM d = V M d D R>o- To minimize the distance between SM d and Vx> d using a numerical 
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Table 9: Path counts on processive and distributive models, ‘{(a, x, y ), A}-hom’ corresponds to a {(o, x, y, A}- 
homogeneous variable grouping and ‘intrinsic x 2 corresponds to the system where X 2 is intrinstically defined. 



Total Degree 

{(<T x, y), A}-hom 

{(a, x, y), A}-hom + intrinsic *2 

Processive Model 
Distributive Model 

124416 paths 
248832 paths 

3744 paths 

7488 paths 

1152 paths 

2304 paths 


algebraic geometry approach, we should solve the system (27)-(30), and then solve the system again 16 
more times, setting one of xi,... ,x\ 2 ,a\,y\,y 2 ,y 3 to zero each time. We know that this will be sufficient 
to check the boundary, since the intersection of with any coordinate hyperplane is zero dimensional. 

The data set we used can be found in the file (aokidata.txt). There are 36 sets of data points where 
each set consists of a triple y = (np-ERK, pY-ERK, pYpT-ERK). Each data point defines a data variety, 
which we will denote (Vu^r for the ith data point. 

The polynomial system (27)—(30) is a classic example of a parameterized system of polynomial equations 
with parameter y (notice that this is different than the rate parameters of the ODE). Let x,y) be the 

Jacobian matrix of f'(a , x , y). The theory for parameter homotopies states that there is a nonempty Zariski 
open set U C C 3 such that for every y* 6 U the nonsingular solutions of: 


f'(a,x,y) 

0 1 


y 


T 


(y* 


0 

jj,(a,x,y)\ T 


(31) 

(32) 


lead to solutions of equations (27)-(30) by homotopy paths. Each honrotopy path is the set of zeros of the 
straight-line homotopy function of t 6 (0,1] as t varies from 1 to 0: 


H{x,t) 


f(a,x,y ) 

0 

y T ~ (t(y*) T + (1 - t)y T ) 


Jj*{a,x,y) X T 


As t —> 0, we obtain numerical approximations to the solutions of equations (27)—(30). 

The benefit of employing a parameter homotopy is that after solving equations (31)-(32) with a more 
general method a priori, we significantly reduce the computation required in solving equations (27)-(30) 
for each i. In practice, the entries of y* are sampled uniformly along the complex unit circle. More details 
on how parameter homotopies fit into numerical algebraic geometry can be found in [1] and [2]. 

In addition to employing a parameter homotopy solving scheme, equations (31)—(32), or equally equations 
(27)-(30), have a natural {(a, x, y), A}-homogenous structure (see [1] and [2]). This observation significantly 
reduces the number of homotopy paths that need to be tracked numerically. In addition, this increases 
stability of path tracking. Multihomogenous structures are used alongside parameter homotopies to solve 
equations (27)-(30). 

One additional reformulation that we can do to reduce computation is to define some of the variables 
intrinsically. This is common if one or more variables can be written as a linear combination of some of the 
other variables. Specifically, we know from Table 7 that: 


x 2 = cRAF tot 


where cRAF to( is defined as a constant in Table 8. Thus, we can “remove” X 2 from our computations. 
Partial derivatives are no longer necessary with respect to the variable X 2 , and X 2 is no longer defined 
explicitly when tracking homotopy paths. 

Table 9 summarizes the sequence of reductions made in the number of paths by imposing a {(a, x , y), A}- 
homogeneous structure followed by intrinsically defining the variable X 2 along with the number of paths 
required using the standard total degree homotopy [1], [2]. 
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Table 10: Expected timings for the MAPK model collected over 20 ‘random’ runs. 



Compute Dimension 

Initial Solve (parallel) 

Data Solve (all 36) 

Processive Model 
Distributive Model 

9.14 sec ±1.21 sec 
13.54 sec ±1.08 sec 

28.05 sec ± 3.24 sec 
53.06 sec ± 9.20 sec 

10.95 sec ±2.09 sec 
19.09 sec ±4.84 sec 


Table 12 and Table 13 record the distances between the data and model varieties for all 36 data points. 
A missing “interior” distance in Table 12 and Table 13 indicate there were no positive real critical points 
found for the given EGF level and replicate. However, we may still compute a distance to the boundary of 
the semi-algebraic sets corresponding to each model. Bert ini input files, shell scripts, and MATLAB scripts 
are available within the supplementary files to analyze model selection and parameter estimation. The 
distances are summarized graphically in Figure SI. 

Timing summaries for both the processive and distributive model can be found in Table 10. These timings 
include the numerical irreducible decomposition required to compute the dimension of each component in 
the model variety, solving equations (31)—(32) required to employ a parameter liomotopy scheme, and the 
parameter homotopy to solve equations (27)—(30) for all i. Timings to compute the dimension of the model 
variety and the data solve were done in serial using a Apple MacBook Pro with 2.4 GHz Intel “Core 
i5” processor. The initial solve for the parameter homotopies were done in parallel using 96 (2.67 GHz 
Xeon-5650) compute nodes on the CentOS 5.11 operating system. 


Figure SI: Distance Plot 
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Table 11: Parameter estimate of k\ for distributive and processive MAP Kinase models 


EGF level 

Distributive 

Processive 

1 (0.0244140625 ng/mL) 

0.006655185015566 

0.002630893498837 

1 

0.005169208080985 

0.002666996926268 

1 

0.010517845922915 

0.004869916688582 

2 (0.048828125 ng/mL) 

0.010599752816972 

0.004139244229281 

2 

0.005294859090859 

0.002185712163548 

2 

0.012645415710605 

0.005598240936340 

3 (0.09765625 ng/mL) 

0.013040547423470 

0.005555450037129 

3 

0.007862190037618 

0.003676690633723 

3 

0.007862190037618 

0.010890090940375 

4 (0.1953125 ng/mL) 

0.022314241866226 

0.007566455646161 

4 

0.014767039426564 

0.010026925643431 

4 

0.032112677267837 

0.014358973830327 

5 (0.390625 ng/mL) 

0.057037089355901 

0.028188983627261 

5 

0.034598433900385 

0.018615955020805 

5 

0.046993978170041 

0.023610675947125 

6 (0.78125 ng/mL) 

0.171132616846834 

0.081810937526556 

6 

0.108600436914432 

0.052541291660911 

6 

0.128469450822607 

0.062127115025643 

7 (1.5625 ng/ML) 

0.552602449693322 

0.311829951745710 

7 

0.198177806441869 

0.094130793284512 

7 

0.307630980846653 

0.162410456627761 

8 (3.125 ng/ML) 

1.535918937663066 

1.104298831092584 

8 

1.558792683503375 

0.653311235583287 

8 

1.114642498639051 

0.700052847271085 

9 (6.25 ng/ML) 

0 

0 

9 

6.741089632275663 

2.682148283074403 

9 

0 

0 

10 (12.5 ng/ML) 

0 

0 

10 

0 

0 

10 

2.556601780467115 

1.856045810178439 

11 (25 ng/ML) 

0 

0 

11 

0 

0 

11 

0 

0 

12 (50 ng/ML) 

0 

0 

12 

0 

0 

12 

0 

0 
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